# High-Precision Action Recognition

Xclip Base Patch16 Kinetics 600 16 Frames
MIT
X-CLIP is an extension of CLIP for general video-language understanding, supporting zero-shot, few-shot, or fully supervised video classification, as well as video-text retrieval tasks.
Text-to-Video, Transformers, English
microsoft
393
2
Xclip Base Patch16 16 Frames
MIT
X-CLIP is a minimal extension of CLIP for general video-language understanding, trained via contrastive learning on (video, text) pairs.
Text-to-Video, Transformers, English
microsoft
1,034
0
Xclip Base Patch32
MIT
X-CLIP is an extension of CLIP for general video-language understanding, trained via contrastive learning on (video, text) pairs and suited to tasks such as video classification and video-text retrieval.
Text-to-Video, Transformers, English
microsoft
309.80k
84
Videomae Large Finetuned Kinetics
VideoMAE is a self-supervised video pre-training model based on masked autoencoders, fine-tuned on the Kinetics-400 dataset for video classification.
Video Processing, Transformers
MCG-NJU
4,657
12
© 2025 AIbase